CETUS - A Baseline Approach to Type Extraction

Authors

  • Michael Röder
  • Ricardo Usbeck
  • René Speck
  • Axel-Cyrille Ngonga Ngomo
Abstract

The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, the extraction of knowledge about real-world entities is indispensable for populating knowledge bases on the Web of Data. Here, we focus on the recognition of entity types to populate knowledge bases and enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven type pattern extraction from natural-language corpora based on grammar rules, (ii) an analysis of input text to extract types, and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step, using the YAGO ontology as well as the FOX entity recognition tool.
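The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the CETUS implementation: the single hand-written regular expression stands in for the patterns that the paper derives offline from corpora, and the lookup table `HEAD_NOUN_TO_DUL`, the function names, and the example sentences are all hypothetical. The abstract's YAGO-based and FOX-based mappings are replaced here by a plain dictionary lookup.

```python
import re

# Step (i), simplified: one hand-written copula pattern.
# Assumption: CETUS extracts such patterns offline from corpora;
# this regex is only an illustrative stand-in.
TYPE_PATTERNS = [
    re.compile(r"^(?P<entity>.+?) (?:is|was) (?:a|an|the) (?P<type>.+?)[.,]"),
]

# Step (iii), simplified: hypothetical lookup from nouns to class names
# in the style of DOLCE+DnS Ultra Lite. Illustrative only.
HEAD_NOUN_TO_DUL = {
    "city": "dul:Place",
    "village": "dul:Place",
    "company": "dul:Organization",
    "writer": "dul:Person",
}


def extract_type_phrase(sentence):
    """Step (ii): return (entity, type phrase) or None if no pattern matches."""
    for pattern in TYPE_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("entity"), match.group("type")
    return None


def map_to_class(type_phrase):
    """Return the class of the first known noun, scanning right to left.

    This is a crude substitute for a real ontology-based mapping.
    """
    for word in reversed(type_phrase.lower().split()):
        if word in HEAD_NOUN_TO_DUL:
            return HEAD_NOUN_TO_DUL[word]
    return None


def extract_entity_type(sentence):
    """Run steps (ii) and (iii) on a single sentence."""
    found = extract_type_phrase(sentence)
    if found is None:
        return None
    entity, phrase = found
    return entity, phrase, map_to_class(phrase)


if __name__ == "__main__":
    print(extract_entity_type("Leipzig is a city in Saxony."))
    print(extract_entity_type("Acme was an old company, founded in 1900."))
```

Keeping the type phrase separate from the mapped class mirrors the pipeline's split between extracting type evidence and mapping it to an ontology, so the mapping step could be swapped for a different strategy without touching the extraction step.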

Similar resources

Comparison of Local and Non-Local Methods in Covariance Matrix Estimation by Using Multi-baseline SAR Interferometry and Height Extraction for Principal Components with Maximum Likelihood Approach

To date, synthetic aperture radar (SAR) interferometry (InSAR) has been widely exploited in digital elevation model (DEM) generation and deformation mapping. The conventional InSAR technique exploits two SAR images acquired from slightly different angles, in which the information of elevation and deformation can be captured through processing of the phase difference of the image...

A Joint Identification Approach for Argumentative Writing Revisions

Prior work on revision identification typically uses a pipeline method: revision extraction is first conducted to identify the locations of revisions and revision classification is then conducted on the identified revisions. Such a setting propagates the errors of the revision extraction step to the revision classification step. This paper proposes an approach that identifies the revision locat...

Extending a source-to-source compiler with XML capabilities

This paper presents an extension that adds XML capabilities to Cetus, a source-to-source compiler developed by Purdue University. In this work, the Cetus Intermediate Representation is converted into an XML DOM tree that, in turn, enables XML capabilities, such as searching specific code features through XPath expressions. As an example, we write an XPath code to find private and shared variables ...

Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation

Cetus is a compiler infrastructure for the source-to-source transformation of programs. We created Cetus out of the need for a compiler research environment that facilitates the development of interprocedural analysis and parallelization techniques for C, C++, and Java programs. We will describe our rationale for creating a new compiler infrastructure and give an overview of the Cetus architect...

Inside the whale: the structure and dynamics of the isolated Cetus dwarf spheroidal

This paper presents a study of the Cetus dwarf, an isolated dwarf galaxy within the Local Group. A matched-filter analysis of the INT/WFC imaging of this system reveals no evidence for significant tidal debris that could have been torn off the galaxy, bolstering the hypothesis that Cetus has never significantly interacted with either the Milky Way or M31. Additionally, Keck/Deimos spectroscopic...

Publication date: 2015